- Introduction
- Methods
- Project overview
- Data
- Results
- Discussion
Spring 2020
Predicting protein-protein interactions (PPI) is a challenging task.
Machine learning (ML) models make it possible to exploit the content of PPI data sets.
The aim of this project is to create a toolbox that predicts the biological activity of the peptides in these data sets with machine learning models.
| Function | Library |
|---|---|
| Data loading | readxl |
| Data cleaning and wrangling | dplyr, broom (tidyverse) |
| Data augmenting | dplyr (tidyverse), Peptides |
| Extracting data | UniprotR |
| Plotting | ggplot2 (tidyverse), ggseqlogo, ggpubr |
| Analysing | stats |
| Modeling | keras, neuralnet, caret, yardstick, glmnet, ANN2 |

| Data set | Protein | Target | Biological activity | Species | Number of variants | Score |
|---|---|---|---|---|---|---|
| Data set 1 | BRCA1 | BARD1 RING domain | Ubiquitin E3 activity | H. sapiens | 5610 | Y2H assays |
| Data set 2 | ERK2 | Small molecule (SCH772984) | Resistance to drugs | H. sapiens | 6810 | Drug sensitivity assays (calculation of cell viability) |
| Data set 3 | LDLRAP1 | OBFC1 | Protein translation | H. sapiens | 6385 | Y2H assays |
| Data set 4 | Pab1 | eIF4G1 | Translation initiation | S. cerevisiae | 1340 | Y2H assays |
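
Before the variants in these data sets can be passed to a model, each sequence has to be turned into numbers. The following is a minimal sketch in base R, assuming a one-hot encoding over the 20 standard amino acids. The function name `one_hot_encode` and the toy sequence are hypothetical illustrations and are not taken from the project code or the data sets.

```r
# Alphabet of the 20 standard amino acids (one-letter codes).
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Hypothetical helper: one-hot encode a single peptide sequence.
# Returns a numeric matrix with one row per position and one column per amino acid.
one_hot_encode <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]

  # Fail early on characters outside the standard alphabet.
  unknown <- setdiff(residues, amino_acids)
  if (length(unknown) > 0) {
    stop("Unsupported residue(s): ", paste(unknown, collapse = ", "))
  }

  # outer() compares every position with every amino acid;
  # multiplying by 1 converts TRUE/FALSE to 1/0.
  encoded <- outer(residues, amino_acids, "==") * 1
  dimnames(encoded) <- list(seq_along(residues), amino_acids)
  encoded
}

# Toy input for illustration only.
x <- one_hot_encode("MKV")
print(dim(x))
```

Flattening the matrix with `as.vector(t(x))` would give one fixed-length feature vector per variant, provided all sequences in a data set have the same length. Whether one-hot encoding or physicochemical descriptors suit a given data set better is a modeling choice this sketch does not settle.
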
Ideas for supported machine learning framework: